DualBird cloud-native data & AI infra engine built under Spark - Iceberg sort-based compaction use case

July 1, 2026

Iceberg compaction is a compute-intensive operation in modern data lakehouses, placing significant pressure on CPU, memory, and shuffle resources while driving up infrastructure costs. As data volumes continue to grow, these maintenance workloads increasingly become a bottleneck for both performance and scalability.

This white paper evaluates DualBird's cloud-native hardware acceleration engine on a 100 GB sort-based Iceberg compaction benchmark, comparing it against both vanilla Apache Spark and state-of-the-art C++-accelerated Spark. We describe the benchmark methodology, workload characteristics, and system architecture, then analyze performance, cost, and scalability across multiple cluster configurations.

Our results demonstrate up to 12–20× faster Spark task execution, 55–85% lower EC2 costs, a ~6× higher practical performance ceiling, and significantly reduced operational complexity through the elimination of most Spark tuning. The findings show how transparent hardware acceleration can fundamentally improve the execution of compute-intensive Spark workloads without requiring application changes or pipeline rewrites.

Ready to see the full benchmark?

Download the complete white paper for the benchmark methodology, system architecture, and detailed performance and cost analysis.

Get white paper
